Noncentral F distribution (ncf)#
The noncentral F distribution is the distribution of many classical F-statistics under an alternative hypothesis. It generalizes the central F distribution by introducing a noncentrality parameter that captures “signal strength”.
Learning goals#
Understand the generative story: ratio of (noncentral) chi-square variables.
Relate `ncf` to hypothesis testing (ANOVA / regression) and power analysis.
Write down the PDF/CDF (as a Poisson mixture) and interpret parameters.
Derive mean/variance and compute higher moments (skewness/kurtosis) when they exist.
Implement a NumPy-only sampler and validate it with Monte Carlo.
Use `scipy.stats.ncf` for `pdf`, `cdf`, `rvs`, and `fit`.
Prerequisites#
Chi-square and F distributions; expectation/variance
Some comfort with special functions (Beta, Gamma) and log-likelihoods
Notebook roadmap#
Title & Classification
Intuition & Motivation
Formal Definition
Moments & Properties
Parameter Interpretation
Derivations (Expectation, Variance, Likelihood)
Sampling & Simulation (NumPy-only)
Visualization (PDF, CDF, Monte Carlo)
SciPy Integration
Statistical Use Cases
Pitfalls
Summary
import numpy as np
import plotly.graph_objects as go
import os
import plotly.io as pio
from plotly.subplots import make_subplots
from scipy import optimize, special, stats
pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")
np.set_printoptions(precision=4, suppress=True)
SEED = 7
rng = np.random.default_rng(SEED)
import sys
import scipy
import plotly
print("Python:", sys.version.split()[0])
print("NumPy:", np.__version__)
print("SciPy:", scipy.__version__)
print("Plotly:", plotly.__version__)
print("Seed:", SEED)
Python: 3.12.9
NumPy: 1.26.2
SciPy: 1.15.0
Plotly: 6.5.2
Seed: 7
1) Title & Classification#
Name: `ncf` (noncentral F distribution; SciPy: `scipy.stats.ncf`)
Type: Continuous
Support (standard form): \(x \in (0,\infty)\)
Parameter space (standard form):
numerator degrees of freedom \(\nu_1 > 0\) (SciPy: `dfn`)
denominator degrees of freedom \(\nu_2 > 0\) (SciPy: `dfd`)
noncentrality \(\lambda \ge 0\) (SciPy: `nc`)
SciPy location/scale: `loc` \(\in \mathbb{R}\), `scale` \(> 0\), with
$\(X = \text{loc} + \text{scale}\,Y,\qquad Y \sim \mathrm{NCF}(\nu_1,\nu_2,\lambda).\)$
Unless stated otherwise, we work with the standard form (loc=0, scale=1).
Notation you may see:
\(X \sim F_{\text{nc}}(\nu_1,\nu_2,\lambda)\) (noncentral F)
sometimes \(\nu_1\) / \(\nu_2\) are written as `df1` / `df2`
2) Intuition & Motivation#
What it models#
The central F distribution arises as a ratio of two independent variance estimates. The noncentral F appears when the numerator contains a signal term (e.g., the model explains real variation).
A canonical setting is the classical linear model with Gaussian noise:
Under \(H_0\) (no signal), an F-statistic has a central F distribution.
Under \(H_1\) (signal present), the same statistic has a noncentral F distribution with noncentrality \(\lambda\).
Typical real-world use cases#
Power analysis for ANOVA and regression F-tests
Design of experiments: sample size vs detectable effect size
Model comparison (nested models): distribution of the test statistic under alternatives
Relations to other distributions#
Central F: \(\lambda=0\) recovers \(F(\nu_1,\nu_2)\).
Noncentral chi-square: the noncentral F is a ratio involving a noncentral chi-square random variable.
Noncentral t: if \(T \sim \mathrm{nct}(\nu, \delta)\) then \(T^2 \sim \mathrm{ncf}(1,\nu,\delta^2)\).
Poisson mixture: \(\mathrm{ncf}\) can be written as a Poisson mixture of scaled central F distributions (very useful computationally).
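The noncentral-t relation above is easy to check numerically (a quick sketch; the parameter values here are arbitrary): if \(T \sim \mathrm{nct}(\nu,\delta)\), then \(\mathbb{P}(T^2 \le x) = \mathbb{P}(-\sqrt{x} \le T \le \sqrt{x})\) should equal the \(\mathrm{ncf}(1,\nu,\delta^2)\) CDF.

```python
import numpy as np
from scipy import stats

# If T ~ nct(nu, delta), then T^2 ~ ncf(1, nu, delta^2).
nu, delta = 12.0, 1.5
x = np.linspace(0.1, 10.0, 50)
lhs = stats.ncf.cdf(x, 1, nu, delta**2)
rhs = stats.nct.cdf(np.sqrt(x), nu, delta) - stats.nct.cdf(-np.sqrt(x), nu, delta)
# The two CDFs should agree up to numerical error
print("max |diff|:", float(np.max(np.abs(lhs - rhs))))
```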
3) Formal Definition#
Generative definition#
Let
\(U \sim \chi'^2_{\nu_1}(\lambda)\) be a noncentral chi-square random variable with \(\nu_1\) degrees of freedom and noncentrality \(\lambda\).
\(V \sim \chi^2_{\nu_2}\) be an independent central chi-square random variable.
Define $\( X \,=\, \frac{U/\nu_1}{V/\nu_2} \,=\, \frac{\nu_2}{\nu_1}\,\frac{U}{V}. \)$ Then \(X \sim \mathrm{NCF}(\nu_1,\nu_2,\lambda)\).
Central F building block#
For a central F random variable \(Y \sim F(d_1,d_2)\), \(y>0\):
PDF $\( f_F(y; d_1,d_2) = \frac{1}{B\left(\tfrac{d_1}{2},\tfrac{d_2}{2}\right)}\left(\frac{d_1}{d_2}\right)^{d_1/2} \frac{y^{d_1/2-1}}{\left(1+\tfrac{d_1}{d_2}y\right)^{(d_1+d_2)/2}}. \)$
CDF $\( F_F(y; d_1,d_2) = I_{z}(\tfrac{d_1}{2},\tfrac{d_2}{2}),\qquad z = \frac{d_1 y}{d_1 y + d_2}, \)$ where \(I_z(a,b)\) is the regularized incomplete beta function.
Poisson-mixture representation (PDF and CDF)#
A key identity is that a noncentral chi-square is a Poisson mixture of central chi-squares:
Let $\(w_k = \mathbb{P}(K=k) = e^{-\lambda/2}\,\frac{(\lambda/2)^k}{k!}.\)$
Conditional on \(K=k\), \(U \sim \chi^2_{\nu_1+2k}\), so \(\frac{U/(\nu_1+2k)}{V/\nu_2} \sim F(\nu_1+2k,\nu_2)\); hence \(X \mid K=k\) is a scaled central F variable, \(X = \tfrac{\nu_1+2k}{\nu_1}\,Y_k\) with \(Y_k \sim F(\nu_1+2k,\nu_2)\). Thus \(X\) is a Poisson mixture of scaled central F distributions:
PDF $\( f_X(x;\nu_1,\nu_2,\lambda) = \sum_{k=0}^{\infty} w_k\, \frac{\nu_1}{\nu_1+2k}\, f_F\!\Bigl(\frac{\nu_1 x}{\nu_1+2k};\, \nu_1+2k,\nu_2\Bigr),\qquad x>0. \)$
CDF $\( F_X(x;\nu_1,\nu_2,\lambda) = \sum_{k=0}^{\infty} w_k\, I_{z}\bigl(\tfrac{\nu_1}{2}+k,\tfrac{\nu_2}{2}\bigr), \qquad z = \frac{\nu_1 x}{\nu_1 x + \nu_2}. \)$
These series are the most common “closed forms” used in practice.
def validate_ncf_params(dfn: float, dfd: float, nc: float):
    dfn = float(dfn)
    dfd = float(dfd)
    nc = float(nc)
    if not (dfn > 0):
        raise ValueError(f"dfn must be > 0, got {dfn!r}")
    if not (dfd > 0):
        raise ValueError(f"dfd must be > 0, got {dfd!r}")
    if not (nc >= 0):
        raise ValueError(f"nc must be >= 0, got {nc!r}")
    return dfn, dfd, nc
def f_logpdf(x, dfn: float, dfd: float):
    """Log-PDF of the central F(dfn, dfd) distribution (standard form)."""
    dfn = float(dfn)
    dfd = float(dfd)
    x = np.asarray(x, dtype=float)
    out = np.full_like(x, -np.inf, dtype=float)
    mask = x > 0
    xm = x[mask]
    a = 0.5 * dfn
    b = 0.5 * dfd
    logc = a * np.log(dfn / dfd) - special.betaln(a, b)
    out[mask] = logc + (a - 1.0) * np.log(xm) - (a + b) * np.log1p((dfn / dfd) * xm)
    return out
def f_pdf(x, dfn: float, dfd: float):
    return np.exp(f_logpdf(x, dfn, dfd))
def f_cdf(x, dfn: float, dfd: float):
    """CDF of the central F(dfn, dfd) distribution (standard form)."""
    dfn = float(dfn)
    dfd = float(dfd)
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x, dtype=float)
    mask = x > 0
    xm = x[mask]
    z = (dfn * xm) / (dfn * xm + dfd)
    out[mask] = special.betainc(0.5 * dfn, 0.5 * dfd, z)
    return out
# Quick sanity check vs SciPy's central F
x_demo = np.linspace(0.1, 3.0, 5)
print("max |pdf - scipy|:", np.max(np.abs(f_pdf(x_demo, 5, 20) - stats.f.pdf(x_demo, 5, 20))))
print("max |cdf - scipy|:", np.max(np.abs(f_cdf(x_demo, 5, 20) - stats.f.cdf(x_demo, 5, 20))))
max |pdf - scipy|: 2.275957200481571e-15
max |cdf - scipy|: 3.3306690738754696e-16
def poisson_weights_trunc(lam: float, tol: float = 1e-12, k_max: int = 10_000):
    """Truncate Poisson(lam) weights so that the remaining tail mass is < tol."""
    lam = float(lam)
    if lam < 0:
        raise ValueError("lam must be >= 0")
    if lam == 0.0:
        return np.array([1.0]), 1.0
    weights = np.empty(k_max + 1, dtype=float)
    weights[0] = np.exp(-lam)
    mass = weights[0]
    k = 0
    # Recurrence: w_{k+1} = w_k * lam / (k+1)
    while k < k_max and (1.0 - mass) > tol:
        k += 1
        weights[k] = weights[k - 1] * lam / k
        mass += weights[k]
    return weights[: k + 1], mass
def ncf_pdf_series(x, dfn: float, dfd: float, nc: float, tol: float = 1e-12, k_max: int = 10_000):
    """PDF of ncf(dfn, dfd, nc) via the (scaled) Poisson-mixture series."""
    dfn, dfd, nc = validate_ncf_params(dfn, dfd, nc)
    x = np.asarray(x, dtype=float)
    weights, mass = poisson_weights_trunc(nc / 2.0, tol=tol, k_max=k_max)
    out = np.zeros_like(x, dtype=float)
    for k, wk in enumerate(weights):
        # Component k is a *scaled* central F: X | K=k = ((dfn+2k)/dfn) * F(dfn+2k, dfd),
        # so its density is (1/c) * f_F(x/c; dfn+2k, dfd) with c = (dfn+2k)/dfn.
        c = (dfn + 2.0 * k) / dfn
        out += wk * f_pdf(x / c, dfn + 2.0 * k, dfd) / c
    # Renormalize because we truncated the Poisson tail.
    return out / mass
def ncf_cdf_series(x, dfn: float, dfd: float, nc: float, tol: float = 1e-12, k_max: int = 10_000):
    """CDF of ncf(dfn, dfd, nc) via the (scaled) Poisson-mixture series."""
    dfn, dfd, nc = validate_ncf_params(dfn, dfd, nc)
    x = np.asarray(x, dtype=float)
    weights, mass = poisson_weights_trunc(nc / 2.0, tol=tol, k_max=k_max)
    out = np.zeros_like(x, dtype=float)
    for k, wk in enumerate(weights):
        # Same component scaling as the PDF: evaluate component k at x / c.
        c = (dfn + 2.0 * k) / dfn
        out += wk * f_cdf(x / c, dfn + 2.0 * k, dfd)
    return out / mass
# Compare series implementation to SciPy (moderate parameters)
dfn, dfd, nc = 5, 20, 10
x_grid = np.linspace(0.001, 6.0, 400)
pdf_scipy = stats.ncf.pdf(x_grid, dfn, dfd, nc)
cdf_scipy = stats.ncf.cdf(x_grid, dfn, dfd, nc)
pdf_series = ncf_pdf_series(x_grid, dfn, dfd, nc, tol=1e-12)
cdf_series = ncf_cdf_series(x_grid, dfn, dfd, nc, tol=1e-12)
print("pdf series matches SciPy:", bool(np.allclose(pdf_series, pdf_scipy)))
print("cdf series matches SciPy:", bool(np.allclose(cdf_series, cdf_scipy)))
pdf series matches SciPy: True
cdf series matches SciPy: True
4) Moments & Properties#
Existence of moments#
From the ratio representation \(X = (U/\nu_1)/(V/\nu_2)\), negative moments of \(V\sim\chi^2_{\nu_2}\) appear. As a result:
The \(r\)-th raw moment \(\mathbb{E}[X^r]\) exists only if \(\nu_2 > 2r\).
mean exists if \(\nu_2 > 2\)
variance exists if \(\nu_2 > 4\)
skewness needs \(\nu_2 > 6\)
kurtosis needs \(\nu_2 > 8\)
Mean and variance (closed form)#
Let \(X \sim \mathrm{NCF}(\nu_1,\nu_2,\lambda)\).
Mean (for \(\nu_2>2\)): $\( \mathbb{E}[X] = \frac{\nu_2}{\nu_2-2}\left(1+\frac{\lambda}{\nu_1}\right) = \frac{\nu_2(\nu_1+\lambda)}{\nu_1(\nu_2-2)}. \)$
Variance (for \(\nu_2>4\)): $\( \mathrm{Var}(X) = \frac{2\,(\nu_2/\nu_1)^2\left[(\nu_2-2)(\nu_1+2\lambda) + (\nu_1+\lambda)^2\right]}{(\nu_2-2)^2(\nu_2-4)}. \)$
These reduce to the familiar central-F formulas when \(\lambda=0\).
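As a quick sanity check (a small sketch, not part of the original derivation), the closed forms above can be evaluated at \(\lambda=0\) and compared against SciPy's central F moments:

```python
import numpy as np
from scipy import stats

dfn, dfd = 5.0, 20.0
# Closed forms from above, evaluated at lambda = 0
mean_nc0 = dfd * (dfn + 0.0) / (dfn * (dfd - 2))
var_nc0 = (2 * (dfd / dfn) ** 2 * ((dfd - 2) * (dfn + 0.0) + (dfn + 0.0) ** 2)
           / ((dfd - 2) ** 2 * (dfd - 4)))
m, v = stats.f.stats(dfn, dfd, moments="mv")
print("mean:", mean_nc0, "vs central F:", float(m))  # both 10/9
print("var: ", var_nc0, "vs central F:", float(v))
```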
Skewness and kurtosis#
Closed forms exist but get algebraically bulky. A practical approach is:
compute raw moments \(\mathbb{E}[X],\dots,\mathbb{E}[X^4]\) (when they exist),
convert to central moments, then to skewness/kurtosis.
MGF / characteristic function#
The MGF \(M_X(t)=\mathbb{E}[e^{tX}]\) does not exist for any \(t>0\) (polynomial right tail, like the central F).
The Laplace transform \(\mathbb{E}[e^{tX}]\) exists for \(t<0\) and can be computed numerically.
The characteristic function \(\varphi_X(t)=\mathbb{E}[e^{itX}]\) exists for all real \(t\).
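To illustrate the Laplace-transform claim (a sketch; the choice \(t=-0.5\) and the parameter values are arbitrary), \(\mathbb{E}[e^{tX}]\) for \(t<0\) can be evaluated by quadrature against the density, e.g. via `rv_continuous.expect`:

```python
import numpy as np
from scipy import stats

dfn, dfd, nc = 5, 20, 10
t = -0.5  # Laplace transform E[exp(t X)] exists for any t < 0
lap = stats.ncf.expect(lambda x: np.exp(t * x), args=(dfn, dfd, nc))
# Since X > 0 and t < 0, the result must lie strictly between 0 and 1
print("E[exp(-0.5 X)] ≈", lap)
```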
Entropy#
There is no simple elementary closed form in general.
In practice, evaluate it numerically, e.g. with scipy.stats.ncf.entropy.
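The numerical entropy can be cross-checked directly (a sketch; the integration split point and tolerance are illustrative choices):

```python
import numpy as np
from scipy import integrate, stats

dfn, dfd, nc = 5, 20, 10

def neg_f_log_f(x):
    p = stats.ncf.pdf(x, dfn, dfd, nc)
    return -p * np.log(p)

# Differential entropy h = -∫ f log f over (0, ∞); split at the bulk to help quad
h_body, _ = integrate.quad(neg_f_log_f, 1e-10, 20.0, limit=200)
h_tail, _ = integrate.quad(neg_f_log_f, 20.0, np.inf, limit=200)
print("entropy (quad): ", h_body + h_tail)
print("entropy (SciPy):", float(stats.ncf.entropy(dfn, dfd, nc)))
```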
import math
def ncx2_raw_moments(dof: float, nc: float, max_order: int = 4):
    """Raw moments E[U^r] for U ~ noncentral chi-square(dof, nc), for r=1..max_order.
    Uses cumulants from the log-MGF and converts cumulants -> raw moments.
    """
    dof = float(dof)
    nc = float(nc)
    if max_order > 4:
        raise ValueError("This helper is implemented up to order 4.")
    # cumulant κ_r = 2^{r-1} (r-1)! (dof + r*nc)
    kappa = [None]
    for r in range(1, max_order + 1):
        kappa_r = (2 ** (r - 1)) * math.factorial(r - 1) * (dof + r * nc)
        kappa.append(kappa_r)
    k1 = kappa[1]
    if max_order == 1:
        return np.array([k1])
    k2 = kappa[2]
    m1 = k1
    m2 = k2 + k1**2
    if max_order == 2:
        return np.array([m1, m2])
    k3 = kappa[3]
    m3 = k3 + 3 * k2 * k1 + k1**3
    if max_order == 3:
        return np.array([m1, m2, m3])
    k4 = kappa[4]
    m4 = k4 + 4 * k3 * k1 + 3 * (k2**2) + 6 * k2 * (k1**2) + k1**4
    return np.array([m1, m2, m3, m4])
def chisquare_neg_moment(dof: float, r: int):
    """E[V^{-r}] for V ~ chi-square(dof). Requires dof > 2r."""
    dof = float(dof)
    r = int(r)
    if dof <= 2 * r:
        return np.nan
    k = 0.5 * dof
    theta = 2.0
    # For Gamma(k, theta): E[V^{-r}] = theta^{-r} * Γ(k-r)/Γ(k)
    return (theta ** (-r)) * special.gamma(k - r) / special.gamma(k)
def ncf_raw_moments(dfn: float, dfd: float, nc: float):
    """Raw moments E[X^r] for X ~ ncf(dfn, dfd, nc), r=1..4, when they exist."""
    dfn, dfd, nc = validate_ncf_params(dfn, dfd, nc)
    mU = ncx2_raw_moments(dfn, nc, max_order=4)
    out = []
    for r in range(1, 5):
        ev_inv = chisquare_neg_moment(dfd, r)
        if not np.isfinite(ev_inv):
            out.append(np.nan)
            continue
        out.append(((dfd / dfn) ** r) * mU[r - 1] * ev_inv)
    return np.array(out, dtype=float)
def raw_to_central(m1, m2, m3, m4):
    """Convert raw moments to central moments (μ2, μ3, μ4)."""
    mu = m1
    mu2 = m2 - mu**2
    mu3 = m3 - 3 * mu * m2 + 2 * mu**3
    mu4 = m4 - 4 * mu * m3 + 6 * mu**2 * m2 - 3 * mu**4
    return mu2, mu3, mu4
def ncf_mean_closed(dfn: float, dfd: float, nc: float):
    dfn, dfd, nc = validate_ncf_params(dfn, dfd, nc)
    if dfd <= 2:
        return np.nan
    return dfd * (dfn + nc) / (dfn * (dfd - 2))
def ncf_var_closed(dfn: float, dfd: float, nc: float):
    dfn, dfd, nc = validate_ncf_params(dfn, dfd, nc)
    if dfd <= 4:
        return np.nan
    num = 2 * (dfd / dfn) ** 2 * ((dfd - 2) * (dfn + 2 * nc) + (dfn + nc) ** 2)
    den = (dfd - 2) ** 2 * (dfd - 4)
    return num / den
dfn, dfd, nc = 5, 20, 10
m1, m2, m3, m4 = ncf_raw_moments(dfn, dfd, nc)
mu2, mu3, mu4 = raw_to_central(m1, m2, m3, m4)
skew = mu3 / (mu2 ** 1.5)
excess_kurt = mu4 / (mu2**2) - 3.0
print("Raw moments E[X^r], r=1..4:", np.array([m1, m2, m3, m4]))
print("Mean (closed):", ncf_mean_closed(dfn, dfd, nc))
print("Var (closed):", ncf_var_closed(dfn, dfd, nc))
print("Skewness:", skew)
print("Excess kurtosis:", excess_kurt)
mvsk = stats.ncf.stats(dfn, dfd, nc, moments="mvsk")
print("SciPy mvsk:", mvsk)
Raw moments E[X^r], r=1..4: [ 3.3333 15.2778 93.7302 765. ]
Mean (closed): 3.3333333333333335
Var (closed): 4.166666666666667
Skewness: 1.766743077969323
Excess kurtosis: 6.412571428571438
SciPy mvsk: (3.3333333333333335, 4.166666666666667, 1.7667430779693272, 6.412571428571429)
5) Parameter Interpretation#
Degrees of freedom \(\nu_1\) (numerator)#
Comes from the dimension of the signal/constraint being tested.
Larger \(\nu_1\) generally makes the distribution less “spiky” near 0 and less variable.
Degrees of freedom \(\nu_2\) (denominator)#
Often corresponds to residual degrees of freedom.
Controls tail heaviness and moment existence: small \(\nu_2\) implies very heavy tails.
Noncentrality \(\lambda\)#
Encodes how far from the null you are.
In many tests, \(\lambda\) is proportional to sample size × effect size².
Increasing \(\lambda\) shifts mass to the right (larger typical F-statistics) and increases power.
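To make the "sample size × effect size²" scaling concrete, here is a hedged sketch for a balanced one-way ANOVA, using the standard formula \(\lambda = n \sum_i (\mu_i - \bar\mu)^2 / \sigma^2\) with \(n\) observations per group (the group means, \(\sigma\), and sample sizes below are made-up illustration values):

```python
import numpy as np
from scipy import stats

# Hypothetical balanced one-way ANOVA: g groups, n observations per group
mu = np.array([0.0, 0.3, 0.5])   # assumed group means
sigma = 1.0                      # assumed common noise SD
g = len(mu)

def anova_power(n_per_group, alpha=0.05):
    # Noncentrality grows linearly with the per-group sample size
    lam = n_per_group * np.sum((mu - mu.mean()) ** 2) / sigma**2
    dfn, dfd = g - 1, g * (n_per_group - 1)
    f_crit = stats.f.ppf(1 - alpha, dfn, dfd)
    return float(stats.ncf.sf(f_crit, dfn, dfd, lam))

for n in (10, 30, 100):
    print(n, anova_power(n))  # power grows with n
```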
Below we visualize how the PDF changes as we vary parameters.
def plot_pdf_family(dfn: float, dfd: float, ncs, x_max: float = 6.0):
    x = np.linspace(0.001, x_max, 600)
    fig = go.Figure()
    for nc in ncs:
        fig.add_trace(
            go.Scatter(x=x, y=stats.ncf.pdf(x, dfn, dfd, nc), mode="lines", name=f"nc={nc}")
        )
    fig.update_layout(
        title=f"ncf PDF family (dfn={dfn}, dfd={dfd})",
        xaxis_title="x",
        yaxis_title="pdf(x)",
        width=900,
        height=450,
    )
    fig.show()
plot_pdf_family(dfn=5, dfd=20, ncs=[0, 2, 5, 10, 20], x_max=8.0)
6) Derivations#
6.1 Expectation#
Using \(X = \frac{\nu_2}{\nu_1}\frac{U}{V}\) with independent \(U\) and \(V\):
For \(U\sim\chi'^2_{\nu_1}(\lambda)\): $\(\mathbb{E}[U] = \nu_1 + \lambda.\)$
For \(V\sim\chi^2_{\nu_2}\) (a Gamma distribution), for \(\nu_2>2\): $\(\mathbb{E}[V^{-1}] = \frac{1}{\nu_2-2}.\)$
Putting it together: $\( \mathbb{E}[X] = \frac{\nu_2(\nu_1+\lambda)}{\nu_1(\nu_2-2)}. \)$
6.2 Variance#
Similarly, $\( \mathbb{E}[X^2] = \left(\frac{\nu_2}{\nu_1}\right)^2 \mathbb{E}[U^2] \, \mathbb{E}[V^{-2}]. \)$
For \(\nu_2>4\), $\(\mathbb{E}[V^{-2}] = \frac{1}{(\nu_2-2)(\nu_2-4)}.\)$
For \(U\sim\chi'^2_{\nu_1}(\lambda)\), $\( \mathrm{Var}(U)=2(\nu_1+2\lambda) \quad\Rightarrow\quad \mathbb{E}[U^2] = \mathrm{Var}(U) + \mathbb{E}[U]^2. \)$
Then \(\mathrm{Var}(X)=\mathbb{E}[X^2]-\mathbb{E}[X]^2\) yields the closed form from Section 4.
6.3 Likelihood#
Given data \(x_1,\dots,x_n\) assumed i.i.d. from \(\mathrm{NCF}(\nu_1,\nu_2,\lambda)\), the likelihood is $\( L(\nu_1,\nu_2,\lambda) = \prod_{i=1}^{n} f_X(x_i;\nu_1,\nu_2,\lambda), \qquad \ell(\nu_1,\nu_2,\lambda) = \sum_{i=1}^{n} \log f_X(x_i;\nu_1,\nu_2,\lambda). \)$
Because \(f_X\) is a series (or special-function expression), MLE is typically done numerically with constraints \(\nu_1>0,\nu_2>0,\lambda\ge 0\).
# Likelihood demo: estimate nc with dfn/dfd fixed
dfn_true, dfd_true, nc_true = 5, 20, 10
n = 2_000
x = stats.ncf.rvs(dfn_true, dfd_true, nc_true, size=n, random_state=rng)
def neg_loglik_nc(nc: float) -> float:
    if nc < 0:
        return np.inf
    return float(-np.sum(stats.ncf.logpdf(x, dfn_true, dfd_true, nc)))
res = optimize.minimize_scalar(neg_loglik_nc, bounds=(0.0, 40.0), method="bounded")
print("True nc:", nc_true)
print("Estimated nc (MLE, dfn/dfd fixed):", float(res.x))
print("Optimization success:", res.success)
True nc: 10
Estimated nc (MLE, dfn/dfd fixed): 9.982072333858543
Optimization success: True
7) Sampling & Simulation (NumPy-only)#
A clean NumPy-only sampler comes directly from the Poisson-mixture construction.
Draw \(K \sim \mathrm{Poisson}(\lambda/2)\).
Draw \(U \sim \chi^2_{\nu_1+2K}\) (central chi-square).
Draw \(V \sim \chi^2_{\nu_2}\).
Return \(X = (U/\nu_1)/(V/\nu_2)\).
This works because a noncentral chi-square is a Poisson mixture of central chi-squares. (NumPy also ships a direct rng.noncentral_f sampler; here we build one from first principles.)
def ncf_rvs_numpy(rng: np.random.Generator, dfn: float, dfd: float, nc: float, size):
    """Sample from ncf(dfn, dfd, nc) using NumPy only (Poisson-mixture sampler)."""
    dfn, dfd, nc = validate_ncf_params(dfn, dfd, nc)
    lam = 0.5 * nc
    k = rng.poisson(lam, size=size)
    # If df is an array, Generator.chisquare broadcasts and returns the same shape.
    u = rng.chisquare(dfn + 2.0 * k)
    v = rng.chisquare(dfd, size=size)
    return (u / dfn) / (v / dfd)
dfn, dfd, nc = 5, 20, 10
samples = ncf_rvs_numpy(rng, dfn, dfd, nc, size=200_000)
print("Sample mean:", float(np.mean(samples)))
print("Theory mean:", ncf_mean_closed(dfn, dfd, nc))
print("Sample var:", float(np.var(samples)))
print("Theory var:", ncf_var_closed(dfn, dfd, nc))
Sample mean: 3.329072332845629
Theory mean: 3.3333333333333335
Sample var: 4.114308408422413
Theory var: 4.166666666666667
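Beyond matching the first two moments, the whole distribution can be checked against SciPy's CDF with a Kolmogorov–Smirnov test. The sketch below re-implements the Poisson-mixture sampler inline so it is self-contained (sample size, seed, and threshold are arbitrary choices):

```python
import numpy as np
from scipy import stats

rng_check = np.random.default_rng(123)
dfn, dfd, nc = 5, 20, 10
n = 5_000

# Poisson-mixture sampler: K ~ Poisson(nc/2), U ~ chi2(dfn+2K), V ~ chi2(dfd)
k = rng_check.poisson(0.5 * nc, size=n)
u = rng_check.chisquare(dfn + 2.0 * k)
v = rng_check.chisquare(dfd, size=n)
s = (u / dfn) / (v / dfd)

# One-sample KS test against the reference CDF
res = stats.kstest(s, lambda t: stats.ncf.cdf(t, dfn, dfd, nc))
print("KS statistic:", res.statistic)  # small for a correct sampler
```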
8) Visualization#
We’ll visualize:
the PDF and CDF (SciPy vs the truncated-series implementation)
Monte Carlo samples (histogram and empirical CDF)
dfn, dfd, nc = 5, 20, 10
# Monte Carlo samples
n = 50_000
s = ncf_rvs_numpy(rng, dfn, dfd, nc, size=n)
# Grids
x = np.linspace(0.001, 8.0, 700)
pdf_scipy = stats.ncf.pdf(x, dfn, dfd, nc)
cdf_scipy = stats.ncf.cdf(x, dfn, dfd, nc)
pdf_series = ncf_pdf_series(x, dfn, dfd, nc)
cdf_series = ncf_cdf_series(x, dfn, dfd, nc)
# Empirical CDF
s_sorted = np.sort(s)
ecdf_y = (np.arange(1, n + 1) / n).astype(float)
fig = make_subplots(rows=1, cols=2, subplot_titles=("PDF", "CDF"))
# PDF panel
fig.add_trace(
go.Scatter(x=x, y=pdf_scipy, mode="lines", name="SciPy pdf"), row=1, col=1
)
fig.add_trace(
go.Scatter(x=x, y=pdf_series, mode="lines", name="Series pdf", line=dict(dash="dash")),
row=1,
col=1,
)
fig.add_trace(
go.Histogram(
x=s,
nbinsx=80,
histnorm="probability density",
name="MC hist",
opacity=0.35,
),
row=1,
col=1,
)
# CDF panel
fig.add_trace(
go.Scatter(x=x, y=cdf_scipy, mode="lines", name="SciPy cdf"), row=1, col=2
)
fig.add_trace(
go.Scatter(x=x, y=cdf_series, mode="lines", name="Series cdf", line=dict(dash="dash")),
row=1,
col=2,
)
fig.add_trace(
go.Scatter(
x=s_sorted,
y=ecdf_y,
mode="lines",
name="Empirical CDF",
line=dict(width=1),
),
row=1,
col=2,
)
fig.update_layout(
title=f"Noncentral F: dfn={dfn}, dfd={dfd}, nc={nc}",
width=1100,
height=450,
)
fig.update_xaxes(title_text="x", row=1, col=1)
fig.update_xaxes(title_text="x", row=1, col=2)
fig.update_yaxes(title_text="density", row=1, col=1)
fig.update_yaxes(title_text="CDF", row=1, col=2)
fig.show()
9) SciPy Integration#
SciPy provides a full-featured implementation: scipy.stats.ncf.
Common methods:
pdf, logpdf
cdf, sf (survival function), logsf
ppf (quantiles)
rvs (sampling)
stats (moments='mvsk')
entropy (numerical)
scipy.stats.fit for MLE with bounds/constraints
Note: because ncf has multiple parameters (plus loc/scale), unconstrained fitting can be sensitive.
ncf = stats.ncf
dfn, dfd, nc = 5, 20, 10
x0 = 1.5
print("pdf(x0):", float(ncf.pdf(x0, dfn, dfd, nc)))
print("cdf(x0):", float(ncf.cdf(x0, dfn, dfd, nc)))
s = ncf.rvs(dfn, dfd, nc, size=5, random_state=rng)
print("rvs:", s)
print("mvsk:", ncf.stats(dfn, dfd, nc, moments="mvsk"))
print("entropy:", float(ncf.entropy(dfn, dfd, nc)))
# Fitting example using scipy.stats.fit (global optimization by default)
data = ncf.rvs(dfn, dfd, nc, size=400, random_state=rng)
fit_res = stats.fit(
ncf,
data,
bounds={
# Fix loc/scale to standard form
"loc": (0.0, 0.0),
"scale": (1.0, 1.0),
# Either estimate or constrain shape parameters
"dfn": (dfn, dfn),
"dfd": (dfd, dfd),
"nc": (0.0, 50.0),
},
)
print(fit_res)
print("nc_hat:", float(fit_res.params.nc))
pdf(x0): 0.22501760747977675
cdf(x0): 0.14848821017085254
rvs: [2.4224 2.2932 2.5741 1.5943 3.1304]
mvsk: (3.3333333333333335, 4.166666666666667, 1.7667430779693272, 6.412571428571429)
entropy: 1.9476495499863073
params: FitParams(dfn=5.0, dfd=20.0, nc=9.966537234116151, loc=0.0, scale=1.0)
success: True
message: 'Optimization terminated successfully.'
nc_hat: 9.966537234116151
10) Statistical Use Cases#
10.1 Hypothesis testing: power of an F-test#
If an F-statistic has distribution:
under \(H_0\): \(F \sim F(\nu_1,\nu_2)\)
under \(H_1\): \(F \sim F_{\text{nc}}(\nu_1,\nu_2,\lambda)\)
then for significance level \(\alpha\):
critical value \(f_{\text{crit}} = F^{-1}_{\nu_1,\nu_2}(1-\alpha)\)
power \(= \mathbb{P}(F > f_{\text{crit}} \mid H_1) = 1 - F_{\text{nc}}(f_{\text{crit}};\nu_1,\nu_2,\lambda)\).
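The same relation can be inverted, e.g. to find the noncentrality needed for a target power (a sketch; the 80% target and the root-finding bracket are illustrative choices — power is continuous and increasing in \(\lambda\), so bracketing works):

```python
import numpy as np
from scipy import optimize, stats

dfn, dfd, alpha, target = 5, 20, 0.05, 0.80
f_crit = stats.f.ppf(1 - alpha, dfn, dfd)

# Solve power(lambda) = target; power runs from alpha (at lambda=0) toward 1
lam_req = optimize.brentq(
    lambda lam: stats.ncf.sf(f_crit, dfn, dfd, lam) - target, 0.0, 100.0
)
print("lambda for 80% power:", lam_req)
```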
10.2 Bayesian modeling: uncertain effect size#
In Bayesian design / power analysis, you might place a prior on an effect size parameter that implies a prior over \(\lambda\). Then the expected power is an average over that prior.
10.3 Generative modeling#
The noncentral F is a flexible positive distribution for ratios; it can be used as a generative component when the process naturally looks like “signal+noise over noise”. Often, the most useful role is modeling (or simulating) the distribution of a test statistic under plausible alternatives.
# Power curve as a function of the noncentrality parameter
dfn, dfd = 5, 20
alpha = 0.05
f_crit = stats.f.ppf(1 - alpha, dfn, dfd)
nc_grid = np.linspace(0.0, 40.0, 300)
power = stats.ncf.sf(f_crit, dfn, dfd, nc_grid)
fig = go.Figure()
fig.add_trace(go.Scatter(x=nc_grid, y=power, mode="lines", name="power"))
fig.update_layout(
title=f"Power vs noncentrality (dfn={dfn}, dfd={dfd}, α={alpha}, f_crit={f_crit:.3f})",
xaxis_title="noncentrality nc",
yaxis_title="power = P(F > f_crit | H1)",
width=900,
height=450,
yaxis=dict(range=[0, 1]),
)
fig.show()
# Bayesian-flavored: prior over nc -> distribution of power
# (Here we use a simple Gamma prior on nc for illustration.)
prior_nc = rng.gamma(shape=2.0, scale=5.0, size=50_000) # mean 10
prior_power = stats.ncf.sf(f_crit, dfn, dfd, prior_nc)
print("E_prior[power]:", float(np.mean(prior_power)))
print("90% prior interval for power:", np.quantile(prior_power, [0.05, 0.95]))
E_prior[power]: 0.49195779903889264
90% prior interval for power: [0.121 0.9291]
11) Pitfalls#
Invalid parameters: require dfn > 0, dfd > 0, nc >= 0.
Moment existence: the mean requires dfd > 2, variance dfd > 4, skewness dfd > 6, kurtosis dfd > 8.
Numerical issues:
The PDF/CDF involve infinite series or special functions; naive truncation can be inaccurate for extreme parameters.
Prefer scipy.stats.ncf.logpdf, logcdf, sf, logsf in tail computations.
For very large nc, the Poisson mixture may need many terms.
Fitting can be unstable:
Many parameters (including loc/scale) can lead to identifiability issues.
Constrain/fix parameters when you have domain knowledge (e.g., loc=0, scale=1).
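To illustrate the tail advice (a small sketch; the evaluation point is arbitrary), compare sf/logsf against the naive 1 - cdf deep in the right tail:

```python
import numpy as np
from scipy import stats

dfn, dfd, nc = 5, 20, 10
x_tail = 100.0
sf_val = stats.ncf.sf(x_tail, dfn, dfd, nc)
logsf_val = stats.ncf.logsf(x_tail, dfn, dfd, nc)
# 1 - cdf suffers from cancellation once cdf is close to 1 in double precision
naive = 1.0 - stats.ncf.cdf(x_tail, dfn, dfd, nc)
print("sf:", sf_val, " logsf:", logsf_val, " 1-cdf:", naive)
```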
12) Summary#
ncf is the noncentral analogue of the F distribution and describes many F-statistics under alternatives.
It can be defined as a ratio of an independent noncentral chi-square and a central chi-square.
The PDF/CDF are naturally expressed as a Poisson mixture of (scaled) central F distributions.
Mean/variance have clean closed forms (when dfd is large enough); higher moments require stronger conditions.
A simple NumPy-only sampler uses the Poisson-mixture construction.
For practical work (tails, fitting, entropy), rely on scipy.stats.ncf and prefer log-domain functions.